FROST/ROAST readiness branch #3866

Draft
mswilkison wants to merge 183 commits into main from feat/frost-schnorr-migration-scaffold

@mswilkison mswilkison commented Feb 19, 2026

Current State (as of 2026-05-17)

This draft PR is the umbrella readiness branch for feat/frost-schnorr-migration-scaffold.
It is being kept current with main so it can become a direct merge target if the FROST/ROAST stack is approved for activation.

It remains in draft until the remaining phase-gate, governance, and cross-repository readiness items are closed.

Canonical Status Sources

  • Cross-repo migration tracker: docs/frost-migration/external-repository-tracking.md (in tlabs-xyz/tbtc)
  • Companion tBTC umbrella draft: https://github.com/tlabs-xyz/tbtc/pull/10
  • Latest readiness audit: docs/reviews/frost-roast-production-readiness-2026-05-16.md (in tlabs-xyz/tbtc)

Latest Refresh

  • Merged current main into this branch.
  • Local verification passed for the FROST signing package and tBTC signer backend paths, with and without frost_native.
  • Local verification also passed the native tBTC signer-path tests covering the FFI signing primitive and signing executor.

Remaining Cross-Repo Closure Items

  • Wait for CI from the latest refresh to complete.
  • Capture the first post-fix funded nightly live run artifact for Phase 4.
  • Record final approver signoff in the Phase 4 decision/packet docs.
  • Execute external org archive/redirect mapping and record results.

Notes

  • Keep this PR in draft until the activation decision is explicit.
  • Treat it as the readiness branch for the integrated keep-core side of the stack, not only a historical index.

@mswilkison mswilkison changed the title from "Draft: Add Schnorr/FROST migration scaffold package and RFC" to "Draft: Add Schnorr/FROST scaffold and tBTC runtime signing adapter slice" Feb 20, 2026
maclane added 26 commits February 20, 2026 09:30
mswilkison added 16 commits May 22, 2026 21:32
Adds the helper Phase 6.2 will use to derive AttemptSeed inputs
from NativeSignerMaterial. No consumer wired yet.

* pkg/frost/signing/dkg_group_pubkey_extraction.go (new, gated
  frost_native)
  - ExtractDkgGroupPublicKeyFromMaterial switches on
    SignerMaterial.Format and returns the canonical bytes that
    attempt.DeriveAttemptSeed consumes.
  - FrostUniFFIV2: hex-decode PublicKeyPackage.VerifyingKey
    (production materials use hex-encoded x-only output keys).
  - FrostTBTCSignerV1: use raw bytes of payload.KeyGroup; the
    tbtc-signer engine treats KeyGroup as the canonical handle for
    the FROST key group, so its bytes are deterministic across
    honest signers running the same tbtc-signer build.
  - FrostUniFFIV1: returns ErrUnsupportedSignerMaterialFormat
    with operator-guidance text directing migration to V2 or
    TBTCSignerV1 before enabling ROAST retry. RFC-21 Resolved
    Decision: Phase 7's manifest flip is gated on verified
    migration off V1.

Tests (10 cases in dkg_group_pubkey_extraction_test.go):

* RejectsNilMaterial
* FrostUniFFIV2_HexDecodes -- 32-byte canonical x-only key
* FrostUniFFIV2_RejectsEmptyVerifyingKey
* FrostUniFFIV2_RejectsNonHexVerifyingKey
* FrostTBTCSignerV1_ReturnsKeyGroupBytes
* FrostTBTCSignerV1_DeterministicAcrossCalls -- two consecutive
  calls produce byte-identical output
* FrostTBTCSignerV1_RejectsEmptyKeyGroup
* FrostUniFFIV1_ReturnsUnsupportedSentinel -- errors.Is sentinel +
  operator-guidance text mentioning the migration target formats
* UnknownFormat_ReturnsUnsupportedSentinel -- includes the bad
  format name in the error
* FrostUniFFIV2_GoldenFixture -- locks the hex-decode behaviour
  for a specific input

All pass under: go test -tags 'frost_native frost_tbtc_signer'
./pkg/frost/signing/..., go test -tags 'frost_native
frost_tbtc_signer frost_roast_retry' -race ./pkg/frost/...,
staticcheck -checks '-SA1019' ./pkg/frost/..., gofmt -l
./pkg/frost/signing/, go vet ./pkg/frost/....

Stacked on RFC update #3980. Phase 6.2 wires this helper into
BuildAttemptContextFromRequest, which Phase 6.3 then uses to
populate the orchestration call.

Cross-format note: Production signing groups must run on a single
uniform format. A UniFFIV2 hex-decoded key and a TBTCSignerV1 raw
KeyGroup byte string for the "same" logical group produce
different bytes; they are different formats. Mixed-format groups
are not supported and would silently desynchronise AttemptSeed
derivation. Phase 6.2's helper enforces this at the boundary.
Adds the bridge that converts a NativeExecutionFFISigningRequest
(legacy shape) into an attempt.AttemptContext (RFC-21 shape).
Phase 6.3 calls this from the executor adapter; Phase 6.4 may
use it from the migration call sites.

* pkg/frost/signing/attempt_context_from_request.go (new, gated
  frost_native)
  - BuildAttemptContextFromRequest(*NativeExecutionFFISigningRequest)
    returns (AttemptContext, error). Strict ordering: signer
    material is decoded BEFORE the AttemptContext is constructed,
    so an extraction failure surfaces a clean error rather than a
    half-built context (mitigates the hidden assumption flagged in
    Gemini's Phase-6 review).
  - Format-aware KeyGroupID derivation (per RFC-21 Resolved
    Decision):
      FrostUniFFIV2: HASH160(0x02 || xOnlyOutputKey) via
        frost.WalletPublicKeyHashCompatibilityAlias -- matches
        RFC-20's compatibility-alias scheme exactly.
      FrostTBTCSignerV1: the raw KeyGroup string from
        NativeTBTCSignerMaterialPayload -- the tbtc-signer engine
        treats it as the canonical per-group handle.
  - AttemptNumber is converted from keep-core's 1-based
    Attempt.Number to RFC-21's 0-based AttemptContext.AttemptNumber.
    Rejects Attempt.Number == 0 (must be >= 1).
  - TransientlyParked is empty: Phase 6 ships attempt-zero shape.
    Multi-attempt orchestration with parking metadata lands in
    Phase 7+.
  - messageDigestFromBigInt helper converts *big.Int message to
    the 32-byte canonical digest, left-padding short values.

Sentinel error: ErrAttemptContextConstruction wraps every
construction failure so callers distinguish it from runtime ROAST
errors via errors.Is. ErrUnsupportedSignerMaterialFormat from
PR 6.1 propagates through wrapped chains intact.

Tests (15 cases in attempt_context_from_request_test.go):

* UniFFIV2_HappyPath
* UniFFIV2_KeyGroupIDDerivation -- verifies HASH160 exactly via
  the reference function
* TBTCSignerV1_KeyGroupIDIsRawIdentifier
* RejectsNilRequest -- with sentinel
* RejectsNilMessage
* RejectsNilSignerMaterial
* RejectsNilAttempt
* RejectsZeroAttemptNumber
* PropagatesExtractionErrors -- ErrUnsupportedSignerMaterialFormat
  unwraps correctly even after ErrAttemptContextConstruction wraps
* AttemptNumberIsZeroBased (3 sub-cases: 1->0, 2->1, 5->4)
* DeterministicAcrossInvocations -- two calls with same request
  produce byte-identical AttemptContext hashes
* HashChangesWhenMessageDigestChanges
* HashChangesWhenIncludedSetChanges
* messageDigestFromBigInt: PadsShortBigInts
* messageDigestFromBigInt: RejectsLongBigInts
* SmokeTestSha256Length -- AttemptContext.MessageDigestLength
  matches sha256.Size

All pass under: go test -tags 'frost_native frost_tbtc_signer'
./pkg/frost/signing/..., go test -tags 'frost_native
frost_tbtc_signer frost_roast_retry' -race ./pkg/frost/...,
staticcheck -checks '-SA1019' ./pkg/frost/..., go vet
./pkg/frost/..., gofmt -l ./pkg/frost/signing/.

Stacked on Phase 6.1 (#3981). Phase 6.3 wires
BeginOrchestrationForSession into the executor adapter using this
helper.
…or adapter

Adds the entry-point helper that calls
BeginOrchestrationForSession from the
nativeExecutionFFIExecutorAdapter.Execute method, gated by the
frost_native build tag with a permanent default-build no-op stub.

Per the RFC-21 Phase-6 Resolved Decision on orchestration error
taxonomy (#3980):

  - BuildAttemptContextFromRequest failures are treated as STATIC
    fallbacks. They are per-input deterministic: the same
    NativeExecutionFFISigningRequest produces the same construction
    outcome on every honest node. Log at INFO and continue without
    orchestration.
  - BeginOrchestrationForSession failures matching
    ErrRoastRetryReadinessOptOut or
    ErrNoRoastRetryCoordinatorRegistered are STATIC fallbacks for
    the same reason (deterministic per deployment configuration).
  - Any other BeginOrchestrationForSession failure is a RUNTIME
    Coordinator state-machine error. HARD FAIL: return error from
    the executor adapter. The signing group must NOT have node A
    on legacy shuffle while node B is on ROAST state machine,
    which would fracture NextAttempt agreement.

New files:

* pkg/frost/signing/roast_retry_executor_entry_default_build.go
  (//go:build !frost_native)
  - attemptRoastRetryOrchestrationFromRequest permanent stub
    returning (nil, nil). The executor adapter compiles and runs
    in the default build with zero orchestration overhead.

* pkg/frost/signing/roast_retry_executor_entry_frost_native.go
  (//go:build frost_native)
  - Real implementation. Walks the (build context, begin, return
    cleanup) sequence with the error-classification discipline.
  - Defensive nil-logger handling so the existing executor-
    adapter tests (which pass nil) do not panic.

* pkg/frost/signing/roast_retry_orchestration.go (extended)
  - ErrNoRoastRetryCoordinatorRegistered sentinel.
  - BeginOrchestrationForSession wraps the sentinel via fmt.Errorf
    %w so callers can errors.Is it.

* pkg/frost/signing/native_ffi_executor_adapter.go (modified)
  - Execute now calls attemptRoastRetryOrchestrationFromRequest
    after building the FFI request, defers the cleanup if
    orchestration started, then proceeds to primitive.Sign as
    before.

Tests:

* roast_retry_executor_entry_test.go (default-build, 1 case)
  - Stub returns (nil, nil) for any input.

* roast_retry_executor_entry_frost_native_test.go (frost_native, 4
  cases)
  - Static fallback when no coordinator registered (default-build
    stub of RegisteredRoastRetryCoordinator returns false).
  - Static fallback for FrostUniFFIV1 (unsupported format).
  - Static fallback for nil signer material (deterministic
    precondition).
  - Static fallback for zero attempt number.

* roast_retry_executor_entry_frost_roast_retry_test.go
  (frost_native && frost_roast_retry, 4 cases)
  - Static fallback when readiness env var unset.
  - Static fallback when registry empty.
  - Happy path activates orchestration; binding exists; cleanup
    clears it.
  - HARD FAIL on synthetic runtime BeginAttempt error.

All pass under: go test ./pkg/frost/..., go test -tags
'frost_native frost_tbtc_signer frost_roast_retry' ./pkg/frost/...,
go test -race -tags 'frost_native frost_tbtc_signer
frost_roast_retry' ./pkg/frost/..., staticcheck -checks '-SA1019'
./pkg/frost/..., gofmt -l ./pkg/frost/signing/, go vet
./pkg/frost/....

Stacked on Phase 6.2 (#3982). Phase 6.4 will migrate the actual
participant-selection call sites to consume the
ROAST-coordinator-derived AttemptContext for retry decisions.
…spatcher

Closes Phase 6 of RFC-21 by abstracting the participant-selection
call site in pkg/tbtc/signing_loop.go behind a small dispatcher
interface. PR 6.4 installs the legacy implementation as the
default; Phase 7 will install the ROAST-driven implementation
alongside AggregateBundle production at the executor-adapter
layer.

The migration here is the *abstraction*, not a behavioural change.
Both default and frost_roast_retry builds today execute the same
legacy retry shuffle. The dispatcher exists so Phase 7 can replace
it without touching signing_loop.go's call shape.

* pkg/tbtc/signing_loop_roast_dispatcher.go (new, untagged)
  - signingParticipantSelector interface: single Select method
    matching the legacy shape, plus a sessionID parameter that
    Phase 7's ROAST-driven implementation will use to look up
    the most recent TransitionMessage.
  - defaultSigningParticipantSelector() returns the legacy impl.

* pkg/tbtc/signing_loop_legacy_selector.go (new, untagged)
  - legacySigningParticipantSelector: calls
    pkg/frost/retry.EvaluateRetryParticipantsForSigning verbatim.
  - Documented as the rollback path preserved through Phase 6 so
    the readiness env var can disable ROAST retry without
    deleting the legacy code (per the RFC-21 Phase-6 Resolved
    Decision on rollback preservation).

* pkg/tbtc/signing_loop.go (modified)
  - signingRetryLoop gains participantSelector field; default
    initialised in newSigningRetryLoop.
  - qualifiedOperatorsSet now calls srl.participantSelector.Select
    instead of retry.EvaluateRetryParticipantsForSigning directly.
  - pkg/frost/retry import removed (only the dispatcher's
    legacy implementation uses it now).

Tests (5 cases in signing_loop_roast_dispatcher_test.go):

* defaultSigningParticipantSelector returns the legacy impl
* legacy selector delegates to retry.EvaluateRetryParticipantsForSigning
* legacy selector propagates retry-shuffle errors
* signingRetryLoop routes through the dispatcher (recording
  selector verifies Select called exactly once and result is
  surfaced)
* selector errors propagate through signingRetryLoop

What Phase 7 will add:
- AggregateBundle production at the executor-adapter end (the
  elected coordinator's node generates a TransitionMessage at
  attempt completion).
- Per-session bundle registry so signing_loop can look up the
  most recent bundle for the message.
- ROAST-driven signingParticipantSelector that consumes the
  bundle via EvaluateRoastRetryForSigning and falls back to the
  legacy selector when no bundle is available.
- Readiness manifest flip once integration tests pass on a real
  testnet.

Verification:

* go build ./...                           -- clean
* go test ./pkg/tbtc/... -count=1          -- pass
* go test ./pkg/frost/... -count=1         -- pass
* staticcheck -checks '-SA1019' ./pkg/...  -- silent
* go vet ./pkg/...                         -- clean
* gofmt -l ./pkg/...                       -- silent

Pre-existing test failure note: TestNode_RunCoordinationLayer
fails under the 'frost_native frost_tbtc_signer frost_roast_retry'
tag combination on the integration tip *without* the Phase 6.4
changes (verified by checking out integration-tip's tbtc package
and re-running). Not introduced by this PR; tracked separately.

Stacked on Phase 6.3 (#3983). Closes the Phase 6 PR series.
…istry

Wires AggregateBundle production into the orchestration cleanup
path so the elected coordinator's node automatically produces a
TransitionMessage at the end of each attempt. The bundle is
stashed in a per-session registry that Phase 7.2's ROAST-driven
signingParticipantSelector reads to compute the next attempt's
IncludedSet.

* pkg/frost/signing/roast_retry_bundle_registry_default_build.go
  (//go:build !frost_roast_retry)
  - RecordTransitionBundleForSession, TransitionBundleForSession,
    ClearTransitionBundleForSession,
    ResetTransitionBundleRegistryForTest -- permanent no-op stubs.
    The default-build signing-loop selector therefore always
    sees "no bundle" and falls back to the legacy retry shuffle.

* pkg/frost/signing/roast_retry_bundle_registry_frost_roast_retry.go
  (//go:build frost_roast_retry)
  - Real implementation: sync.RWMutex-protected map; TTL matches
    SessionHandleBindingTTL (two hours).
  - sessionBundleEntry pairs bundle with createdAt for eviction.
  - evictStaleTransitionBundles helper for tests + Phase-7+
    sweeper integration.
  - Later Record-calls overwrite earlier ones (latest transition
    wins).
  - nil bundles silently discarded.

* pkg/frost/signing/roast_retry_orchestration.go (extended)
  - maybeProduceTransitionBundle helper called from the cleanup
    function returned by BeginOrchestrationForSession. The
    helper:
    1. Verifies the local node is the elected coordinator for
       the attempt (skip if not).
    2. Checks the attempt is still Collecting (skip if already
       transitioned -- e.g. signature succeeded, no bundle
       needed).
    3. Calls Coordinator.AggregateBundle.
    4. Stashes the result via RecordTransitionBundleForSession
       (a no-op in default build).
  - Failures along the path are silent: cleanup must never
    panic and must never propagate errors into the signing
    flow's defer chain. A missing bundle just means the next
    attempt's selector falls back to legacy.

Tests:

* roast_retry_bundle_registry_test.go (//go:build !frost_roast_retry, 1 case)
  - Default-build stub is observably no-op.

* roast_retry_bundle_registry_frost_roast_retry_test.go (//go:build
  frost_roast_retry, 5 cases)
  - Round-trip Record -> TransitionBundleForSession.
  - Later Record overwrites earlier (latest-wins).
  - Clear removes the bundle.
  - Nil bundles silently discarded.
  - evictStaleTransitionBundles removes old entries while
    preserving fresh ones.
  - TTL matches session-handle TTL (bundles must not outlive
    sessions).

* roast_retry_orchestration_bundle_test.go (//go:build
  frost_roast_retry, 3 cases)
  - Cleanup on elected coordinator records a non-nil bundle
    with the correct coordinator id after seeding evidence.
  - Cleanup on a non-elected coordinator does NOT record a
    bundle.
  - Double-cleanup is safe (second call sees Transitioned state
    and bails silently without panic).

All pass under: go test ./pkg/frost/..., go test -tags
'frost_roast_retry' ./pkg/frost/signing/..., staticcheck -checks
'-SA1019' ./pkg/frost/..., gofmt -l ./pkg/frost/signing/.

Stacked on Phase 6.4 (#3984). Phase 7.2 installs the ROAST-driven
signingParticipantSelector that consumes the bundle registry.
Installs the ROAST-driven selector as the build-default when the
frost_roast_retry tag is set, consuming the per-session bundle
registry from Phase 7.1 to compute the next attempt's IncludedSet
via EvaluateRoastRetryForSigning. Falls back to the legacy retry
shuffle whenever a precondition is missing (no bundle, no
registry, no session-handle binding).

* pkg/tbtc/signing_loop_selector_default_build.go (//go:build
  !frost_roast_retry)
  - defaultSigningParticipantSelector returns
    legacySigningParticipantSelector. Default-build binary
    contains no ROAST-retry code paths at all.

* pkg/tbtc/signing_loop_selector_frost_roast_retry.go (//go:build
  frost_roast_retry)
  - roastSigningParticipantSelector implements the dispatch:
      bundle absent?              -> legacy fallback
      registry empty?             -> legacy fallback
      no session-handle binding?  -> legacy fallback
      all preconditions met?      -> EvaluateRoastRetryForSigning
  - Errors from EvaluateRoastRetryForSigning (ErrAttemptInfeasible,
    resolver failure) are propagated unchanged per the RFC-21
    Phase-6 hard-fail error taxonomy. Falling back on these
    runtime errors would let one node use legacy retry while
    another uses ROAST -- the signing group would fracture on
    NextAttempt agreement.
  - membersResolver closure maps group.MemberIndex (1-based) to
    chain.Address via the supplied members slice. Validates
    zero and out-of-range inputs.
  - defaultSigningParticipantSelector returns the ROAST selector;
    its first action is to check for bundle availability and
    delegate to the legacy selector when absent.

* pkg/frost/signing/roast_retry_attempt_handle_*.go (extended)
  - Public CurrentAttemptHandleForSession wrapper around the
    unexported currentAttemptHandleForCollect so the ROAST
    selector in pkg/tbtc can read the handle. Default-build
    stub returns ok=false; tagged build returns the real
    binding.
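
The 1-based index mapping performed by the membersResolver closure can be sketched as below. The type names and error text are hypothetical; only the behaviour (1-based lookup, rejection of zero and out-of-range indices) is taken from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// memberIndex and address are hypothetical stand-ins for
// group.MemberIndex and chain.Address.
type memberIndex uint8
type address string

// newMembersResolver returns a closure that maps a 1-based member
// index onto the supplied members slice.
func newMembersResolver(members []address) func(memberIndex) (address, error) {
	return func(idx memberIndex) (address, error) {
		if idx == 0 {
			return "", errors.New("member index is 1-based; got 0")
		}
		if int(idx) > len(members) {
			return "", fmt.Errorf(
				"member index %d out of range for %d members",
				idx, len(members),
			)
		}
		return members[idx-1], nil
	}
}

func main() {
	resolve := newMembersResolver([]address{"0xaaa", "0xbbb", "0xccc"})

	addr, err := resolve(2)
	fmt.Println(addr, err)

	_, err = resolve(0)
	fmt.Println(err)

	_, err = resolve(4)
	fmt.Println(err)
}
```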

Tests (8 cases across two build configurations):

* pkg/tbtc/signing_loop_selector_default_build_test.go
  (//go:build !frost_roast_retry, 1 case)
  - Default-build defaultSigningParticipantSelector returns
    legacySigningParticipantSelector.

* pkg/tbtc/signing_loop_selector_frost_roast_retry_test.go
  (//go:build frost_roast_retry, 7 cases)
  - Tagged-build default is roastSigningParticipantSelector.
  - Bundle absent -> legacy fallback succeeds.
  - Registry empty (bundle recorded but no coordinator
    registered) -> legacy fallback.
  - No session-handle binding -> legacy fallback.
  - membersResolver maps index -> address correctly.
  - membersResolver rejects zero index.
  - membersResolver rejects out-of-range index.
  - End-to-end happy path: register coordinator, bind session,
    seed snapshots, aggregate bundle, record bundle, Select
    returns a non-empty address slice via the ROAST path.

All pass under: go test ./pkg/tbtc/..., go test ./pkg/frost/...,
go test -tags 'frost_roast_retry' ./pkg/tbtc/...
./pkg/frost/signing/..., staticcheck -checks '-SA1019'
./pkg/tbtc/... ./pkg/frost/..., gofmt -l ./pkg/tbtc/
./pkg/frost/signing/, go vet ./pkg/tbtc/... ./pkg/frost/....

Stacked on Phase 7.1 (#3985). With this PR, the
frost_roast_retry-tagged build executes the full
ROAST coordinator-driven retry path end-to-end when (a) the
operator opt-in env var is set, (b) a coordinator is registered,
and (c) a session has progressed past attempt 1 (so a transition
bundle exists). Default builds and tagged builds without
preconditions met still execute the legacy retry shuffle, so the
behavioural rollback path is intact.
Operational documentation describing how to enable the
ROAST-driven retry path in production deployments. Captures the
three activation prerequisites (build tag, env var, coordinator
registration), the behavioural matrix across configurations, the
RFC-21 Phase-6 error-handling discipline (static vs runtime
errors), and the recommended rollout sequencing.

Cross-references every file the multi-phase RFC-21 implementation
touched so operators can trace behaviour back to the responsible
package.

The readiness manifest itself (the cross-repo evidence ledger
that gates production enablement) lives in the tlabs-xyz/tbtc
monorepo's docs/operations/ directory, not in keep-core. This
document is the keep-core-side operational guide; the manifest
is the operational gate.

Doc-only; no code changes.
…#3980)

## Summary

Two new Resolved Decisions in RFC-21, informed by the Phase-6
design review (2026-05-23). Doc-only; +59/-0.

### 1. Orchestration error taxonomy

The orchestration call from the executor adapter into
`BeginOrchestrationForSession` can fail for two fundamentally
different reasons that **must not** be collapsed into a single
"fall back to legacy" path:

| Class | Source | Action |
|---|---|---|
| Static-configuration | env var unset, no coordinator registered | Log INFO, fall back to legacy. Deterministic across nodes -- every honest signer observes the same outcome. |
| Runtime state-machine | `Coordinator.BeginAttempt` failure, internal invariant violated | **HARD FAIL.** Return error from executor adapter; declare session failed. |

The decision is load-bearing for safety. A fall-back-on-runtime-error
policy lets node A run the legacy shuffle while node B proceeds
with the ROAST state machine, splitting the signing group on
`NextAttempt`. Sentinel errors `ErrRoastRetryReadinessOptOut` and
`ErrNoRoastRetryCoordinatorRegistered` (introduced in Phase 6.3)
identify the static class via `errors.Is`.

### 2. `FrostUniFFIV1` signer-material prerequisite

Phase 6.1's `ExtractDkgGroupPublicKeyFromMaterial` cannot extract
the DKG group public key from V1 material. The Phase 7 readiness
manifest flip is therefore gated on verified migration off V1
across production signers.

The migration tracking mechanism is out of scope for this RFC;
the prerequisite is documented here as a hard dependency of
Phase 7.

## Phase 6 implementation plan (updated)

Adapter wiring uses the new error taxonomy:

```go
handle, cleanup, err := BeginOrchestrationForSession(sid, ctx)
switch {
case errors.Is(err, ErrRoastRetryReadinessOptOut),
     errors.Is(err, ErrNoRoastRetryCoordinatorRegistered):
    // Static config: legacy fallback is safe
    log.Infof("ROAST retry disabled: %v", err)
    return legacyParticipantSelection(...)
case err != nil:
    // Runtime error: HARD FAIL, do not fall back
    return nil, fmt.Errorf("orchestration failed: %w", err)
}
defer cleanup()
// Use ROAST path
```

## Test plan

- [ ] Reviewer confirms the static-vs-runtime taxonomy is the right
safety partition.
- [ ] Reviewer confirms the V1 prerequisite is appropriately scoped to
Phase 7 (not earlier).

No code changes; AsciiDoc renders cleanly via the existing docs CI job.
…ion (#3981)

## Summary

First Phase-6 implementation PR. Adds the helper Phase 6.2 will use
to derive `AttemptSeed` inputs from `NativeSignerMaterial`. No
consumer wired yet.

Stacked on #3980 (RFC update).

## What lands

`pkg/frost/signing/dkg_group_pubkey_extraction.go` (new, gated
`frost_native`):

| Format | Extraction |
|---|---|
| `FrostUniFFIV2` | hex-decode `PublicKeyPackage.VerifyingKey` (production materials use hex-encoded x-only output keys) |
| `FrostTBTCSignerV1` | raw bytes of `payload.KeyGroup` -- tbtc-signer engine's canonical handle for the FROST key group |
| `FrostUniFFIV1` | returns `ErrUnsupportedSignerMaterialFormat` with operator-guidance text; RFC-21 Resolved Decision says Phase 7's manifest flip is gated on migration off V1 |
| Unknown | sentinel error with bad-format-name context |

## Why two different extractions

UniFFIV2 carries the raw cryptographic group public key on the
material (hex-encoded). TBTCSignerV1 carries an opaque
`KeyGroup` string identifier -- the actual public key lives in the
tbtc-signer engine and is referenced by KeyGroup. Both
representations are deterministic *within* a format, so two honest
signers running the same format and material derive the same
`AttemptSeed`.

**Important constraint:** production signing groups must run on a
single uniform format. A UniFFIV2 hex-decoded key and a
TBTCSignerV1 raw KeyGroup byte string for the "same" logical group
produce different bytes -- they are different formats. Mixed-format
groups would silently desynchronise `AttemptSeed` derivation;
they are not supported. Phase 6.2's helper enforces this at the
boundary.

## Test coverage (10 cases)

- Rejects nil material
- UniFFIV2 hex-decodes to 32-byte canonical x-only key
- UniFFIV2 rejects empty VerifyingKey
- UniFFIV2 rejects non-hex VerifyingKey
- TBTCSignerV1 returns KeyGroup bytes
- TBTCSignerV1 deterministic across calls
- TBTCSignerV1 rejects empty KeyGroup
- UniFFIV1 returns `ErrUnsupportedSignerMaterialFormat` (errors.Is
sentinel + migration-guidance text)
- Unknown format returns sentinel with bad-format-name context
- UniFFIV2 golden-fixture locks the hex-decode behaviour

## Verification

| Command | Result |
|---|---|
| `go build ./...` | clean |
| `go test ./pkg/frost/signing/...` | pass |
| `go test -tags 'frost_native frost_tbtc_signer frost_roast_retry' -race ./pkg/frost/...` | pass (5 packages) |
| `staticcheck -checks '-SA1019' ./pkg/frost/...` | silent |
| `go vet ./pkg/frost/...` | clean |
| `gofmt -l ./pkg/frost/signing/` | silent |

## Phase 6 plan

| PR | Scope | State |
|---|---|---|
| **6.1 (this)** | **DKG group-public-key extraction** | **open** |
| 6.2 | `BuildAttemptContextFromRequest` helper | next |
| 6.3 | Wire orchestration at executor adapter (strict error taxonomy per #3980) | after 6.2 |
| 6.4 | Migrate three call sites onto `EvaluateRoastRetryForSigning` | after 6.3 |

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the UniFFIV1-as-error decision (alternative:
silent-ignore for graceful degradation).
- [ ] Reviewer confirms the format-bytes-as-canonical approach
(alternative: derive a uniform representation across formats).
…st (#3982)

## Summary

Second Phase-6 PR. Adds the bridge that converts a
`NativeExecutionFFISigningRequest` (legacy shape) into an
`attempt.AttemptContext` (RFC-21 shape). Phase 6.3 calls this from
the executor adapter; Phase 6.4 may use it from migration call
sites.

Stacked on #3981 (Phase 6.1).

## What lands

`pkg/frost/signing/attempt_context_from_request.go` (new, gated
`frost_native`):

| Surface | Notes |
|---|---|
| `BuildAttemptContextFromRequest(*NativeExecutionFFISigningRequest)` | Returns `(AttemptContext, error)`. Strict ordering: decodes signer material BEFORE constructing AttemptContext, surfacing extraction failures cleanly rather than producing half-built contexts (Gemini's Phase-6 review concern). |
| Format-aware `KeyGroupID` derivation | UniFFIV2: `HASH160(0x02 \|\| xOnlyOutputKey)` via `frost.WalletPublicKeyHashCompatibilityAlias`. TBTCSignerV1: raw `KeyGroup` string from payload. |
| 1-based → 0-based attempt-number conversion | keep-core's `Attempt.Number` is 1-based; RFC-21's `AttemptContext.AttemptNumber` is 0-based. Rejects `Number == 0`. |
| `messageDigestFromBigInt` helper | converts `*big.Int` to 32-byte canonical digest, left-padding short values, rejecting >32 bytes |
| `ErrAttemptContextConstruction` sentinel | wraps every construction failure; distinguishable from runtime ROAST errors via `errors.Is` |

## Test coverage (15 cases)

- UniFFIV2 happy path
- UniFFIV2 KeyGroupID exactly matches `HASH160(0x02 || key)` via
reference function
- TBTCSignerV1 KeyGroupID is raw identifier
- Rejects nil request, nil message, nil signer material, nil attempt
- Rejects `Attempt.Number == 0`
- Propagates `ErrUnsupportedSignerMaterialFormat` unchanged through
the construction wrapper
- Attempt-number conversion (1→0, 2→1, 5→4)
- Deterministic across invocations (same request → same hash)
- Hash changes when message digest changes
- Hash changes when included set changes
- Digest padding for short big.Int values
- Digest rejection for >32-byte big.Int values
- AttemptContext digest length matches `sha256.Size`

## Phase 6 status

| PR | Scope | State |
|---|---|---|
| 6.1 (#3981) | DKG group-public-key extraction | open |
| **6.2 (this)** | **BuildAttemptContextFromRequest** | **open** |
| 6.3 | Wire orchestration at executor adapter (strict error taxonomy from #3980) | next |
| 6.4 | Migrate three call sites onto `EvaluateRoastRetryForSigning` | after 6.3 |

## Verification

| Command | Result |
|---|---|
| `go build ./...` | clean |
| `go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/signing/...` | pass |
| `go test -tags 'frost_native frost_tbtc_signer frost_roast_retry' -race ./pkg/frost/...` | pass (5 packages) |
| `staticcheck -checks '-SA1019' ./pkg/frost/...` | silent |
| `go vet ./pkg/frost/...` | clean |
| `gofmt -l ./pkg/frost/signing/` | silent |

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the format-aware KeyGroupID derivation matches
the RFC-21 Resolved Decision intent.
- [ ] Reviewer confirms the strict ordering (extract first, construct
second) is the right defence against half-built contexts.
…or adapter (#3983)

## Summary

Third Phase-6 PR. Wires `BeginOrchestrationForSession` into the
`nativeExecutionFFIExecutorAdapter.Execute` method with the
strict error-handling discipline from RFC-21 (#3980).

Stacked on #3982 (Phase 6.2).

## Error taxonomy implemented

| Source | Class | Action |
|---|---|---|
| `BuildAttemptContextFromRequest` failure (any reason) | Static -- deterministic per request | Log INFO, fall back (no orchestration) |
| `ErrRoastRetryReadinessOptOut` | Static -- deterministic per env var | Log INFO, fall back |
| `ErrNoRoastRetryCoordinatorRegistered` | Static -- deterministic per registration | Log INFO, fall back |
| Any other `BeginOrchestrationForSession` error | **Runtime** -- non-deterministic across nodes | **HARD FAIL** |

The hard-fail discipline is load-bearing for safety. A
fall-back-on-runtime-error policy lets node A run the legacy
shuffle while node B proceeds with the ROAST state machine,
splitting the signing group on `NextAttempt`. Gemini's Phase-6
review flagged this as a critical risk; this PR is the
implementation of the resolution.

## What lands

| File | Build tag | Role |
|---|---|---|
| \`roast_retry_executor_entry_default_build.go\` | \`!frost_native\` | Permanent stub returning \`(nil, nil)\`. Executor adapter compiles + runs with zero orchestration overhead in default builds. |
| \`roast_retry_executor_entry_frost_native.go\` | \`frost_native\` | Real implementation walking (build context, begin, return cleanup) with error classification. Defensive nil-logger handling. |
| \`roast_retry_orchestration.go\` (extended) | untagged | Adds \`ErrNoRoastRetryCoordinatorRegistered\` sentinel; \`BeginOrchestrationForSession\` wraps it via \`fmt.Errorf %w\` |
| \`native_ffi_executor_adapter.go\` (modified) | untagged | \`Execute\` calls the entry helper after building the FFI request, defers cleanup, then proceeds to \`primitive.Sign\` |

## Why \`BuildAttemptContextFromRequest\` failures are STATIC

Even though they look like "runtime" errors (nil fields, zero
attempt numbers, etc.), they are **per-input deterministic**:
the same \`NativeExecutionFFISigningRequest\` produces the same
construction outcome on every honest node. Two honest nodes can
only disagree on construction success if they receive different
requests, which is an upstream-orchestrator bug rather than a
ROAST concern. Falling back to legacy in this case preserves
liveness without splitting the signing group.

Coordinator state-machine errors (BeginAttempt OOM, internal
invariant violations) are genuinely non-deterministic per-node
and therefore must hard-fail.

## Test coverage

| File | Build | Cases |
|---|---|---|
| \`roast_retry_executor_entry_test.go\` | default | 1 (stub returns (nil, nil) for any input) |
| \`roast_retry_executor_entry_frost_native_test.go\` | \`frost_native\` | 4 (no coordinator registered; FrostUniFFIV1; nil signer material; zero attempt number -- all static fallbacks) |
| \`roast_retry_executor_entry_frost_roast_retry_test.go\` | \`frost_native && frost_roast_retry\` | 4 (env var unset; registry empty; happy path; **HARD FAIL on runtime BeginAttempt error**) |

## Verification

| Command | Result |
|---|---|
| \`go build ./...\` | clean |
| \`go test ./pkg/frost/...\` | pass (default build) |
| \`go test -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./pkg/frost/...\` | pass |
| \`go test -race -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./pkg/frost/...\` | pass |
| \`staticcheck -checks '-SA1019' ./pkg/frost/...\` | silent |
| \`go vet ./pkg/frost/...\` | clean |
| \`gofmt -l ./pkg/frost/signing/\` | silent |

## Phase 6 status

| PR | Scope | State |
|---|---|---|
| 6.1 (#3981) | DKG group-public-key extraction | open |
| 6.2 (#3982) | BuildAttemptContextFromRequest | open |
| **6.3 (this)** | **Wire orchestration at executor adapter** | **open** |
| 6.4 | Migrate three signing call sites onto EvaluateRoastRetryForSigning | next |

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the static-vs-runtime classification matches the
RFC-21 Phase-6 discipline.
- [ ] Reviewer confirms the defensive nil-logger fallback is acceptable
(alternative: require all callers to pass a real logger).
…spatcher (#3984)

## Summary

**Closes Phase 6 of RFC-21.** Abstracts the participant-selection
call site in \`pkg/tbtc/signing_loop.go\` behind a small dispatcher
interface. The legacy implementation is installed as the default;
Phase 7 will install the ROAST-driven implementation alongside
AggregateBundle production.

**The migration here is the *abstraction*, not a behavioural
change.** Both default and \`frost_roast_retry\` builds today execute
the same legacy retry shuffle. The dispatcher exists so Phase 7
can replace it without touching the call shape.

Stacked on #3983 (Phase 6.3).

## Why this scope

During implementation, the participant-migration target turned out
to require two pieces that aren't fully wired yet:

1. AggregateBundle production at attempt-completion time (on the
   elected coordinator's node).
2. A per-session bundle registry so \`signing_loop\` can find the
   most recent \`TransitionMessage\` for a given message.

Both are Phase 7 work. PR 6.4 ships the **dispatcher abstraction**
that lets Phase 7 slot the ROAST implementation in without
touching \`signing_loop.go\` itself, plus the legacy implementation
as the operational fallback that the readiness env var disables.

## What lands

| File | Role |
|---|---|
| \`pkg/tbtc/signing_loop_roast_dispatcher.go\` (new) | \`signingParticipantSelector\` interface (\`Select(members, seed, retryCount, honestThreshold, sessionID) → addresses\`). \`defaultSigningParticipantSelector()\` returns the legacy impl. |
| \`pkg/tbtc/signing_loop_legacy_selector.go\` (new) | \`legacySigningParticipantSelector\` -- byte-identical call to \`retry.EvaluateRetryParticipantsForSigning\`. Documented as the rollback path through Phase 6. |
| \`pkg/tbtc/signing_loop.go\` (modified) | \`signingRetryLoop\` gains \`participantSelector\` field; \`qualifiedOperatorsSet\` routes through it. \`pkg/frost/retry\` import removed (only the legacy selector uses it now). |

## Test coverage (5 cases)

- \`defaultSigningParticipantSelector\` returns the legacy impl
- legacy selector delegates to retry shuffle
- legacy selector propagates retry-shuffle errors
- \`signingRetryLoop.qualifiedOperatorsSet\` routes through the
dispatcher (recording selector verifies)
- selector errors propagate through \`signingRetryLoop\` with
\`errors.Is\` preserving the sentinel

## What Phase 7 will add

- AggregateBundle production at the executor-adapter end (the elected
coordinator's node generates a \`TransitionMessage\` at attempt
completion)
- Per-session bundle registry so \`signing_loop\` can look up the most
recent bundle for the message
- ROAST-driven \`signingParticipantSelector\` that consumes the bundle
via \`EvaluateRoastRetryForSigning\` and falls back to the legacy
selector when no bundle is available
- Readiness manifest flip once integration tests pass on a real testnet

## Pre-existing test failure note

\`TestNode_RunCoordinationLayer\` fails under the
\`frost_native frost_tbtc_signer frost_roast_retry\` tag combination
on the integration tip *without* the Phase 6.4 changes (verified
by checking out integration-tip's tbtc package and re-running).

**Not introduced by this PR.** Tracked separately. Default-build
\`go test ./pkg/tbtc/... ./pkg/frost/...\` (149s) passes cleanly.

## Verification

| Command | Result |
|---|---|
| \`go build ./...\` | clean |
| \`go test ./pkg/tbtc/... -count=1\` | pass (149s default build) |
| \`go test ./pkg/frost/... -count=1\` | pass |
| \`staticcheck -checks '-SA1019' ./pkg/tbtc/...\` | silent |
| \`go vet ./pkg/tbtc/...\` | clean |
| \`gofmt -l ./pkg/tbtc/\` | silent |

## Phase 6 complete

| PR | Scope | State |
|---|---|---|
| 6.1 (#3981) | DKG group-public-key extraction | open |
| 6.2 (#3982) | BuildAttemptContextFromRequest | open |
| 6.3 (#3983) | Wire orchestration at executor adapter | open |
| **6.4 (this)** | **signing-loop participant-selection dispatcher** | **open** |

Phase 7: AggregateBundle wiring + ROAST selector + manifest flip.

## Test plan

- [ ] CI green (default build).
- [ ] Reviewer confirms the dispatcher abstraction is the right
granularity (alternative: function-pointer indirection without an
interface).
- [ ] Reviewer confirms deferring participant migration to Phase 7 is
acceptable. The trade-off: smaller Phase 6 PRs but Phase 7 also covers
the AggregateBundle wiring + ROAST selector + manifest flip.
…istry (#3985)

## Summary

First Phase-7 PR. Wires \`AggregateBundle\` production into the
orchestration cleanup path so the elected coordinator's node
automatically produces a \`TransitionMessage\` at the end of each
attempt. The bundle is stashed in a per-session registry that
Phase 7.2's ROAST-driven \`signingParticipantSelector\` reads to
compute the next attempt's \`IncludedSet\`.

Stacked on #3984 (Phase 6.4).

## What lands

| File | Build tag | Role |
|---|---|---|
| \`roast_retry_bundle_registry_default_build.go\` | \`!frost_roast_retry\` | Permanent no-op stubs. Default-build selector always falls back to legacy. |
| \`roast_retry_bundle_registry_frost_roast_retry.go\` | \`frost_roast_retry\` | Real mutex-protected map. TTL matches \`SessionHandleBindingTTL\` (2h). Later Record-calls overwrite (latest transition wins). |
| \`roast_retry_orchestration.go\` (extended) | untagged | New \`maybeProduceTransitionBundle\` helper called from cleanup. |

## How cleanup produces a bundle

After \`BeginOrchestrationForSession\` returns, the deferred cleanup
fires at session end. It:

1. Verifies the local node is the elected coordinator (skip if not).
2. Checks the attempt is still \`Collecting\` (skip if already
transitioned -- e.g. signature succeeded, no bundle needed).
3. Calls \`Coordinator.AggregateBundle\`.
4. Stashes the result via \`RecordTransitionBundleForSession\` (no-op
stub in default build).

**Failures along the path are silent.** Cleanup must never panic
and must never propagate errors into the signing flow's defer
chain. A missing bundle just means the next attempt's selector
falls back to legacy.

## Test coverage

| File | Build | Cases |
|---|---|---|
| \`roast_retry_bundle_registry_test.go\` | \`!frost_roast_retry\` | 1 (default stub is observable no-op) |
| \`roast_retry_bundle_registry_frost_roast_retry_test.go\` | \`frost_roast_retry\` | 5 (round-trip, latest-wins, clear, nil-discard, TTL eviction, TTL matches session-handle TTL) |
| \`roast_retry_orchestration_bundle_test.go\` | \`frost_roast_retry\` | 3 (elected coordinator records, non-elected does not, double-cleanup is safe) |

## Verification

| Command | Result |
|---|---|
| \`go build ./...\` | clean |
| \`go test ./pkg/frost/...\` | pass (5 packages) |
| \`go test -tags 'frost_roast_retry' ./pkg/frost/signing/...\` | pass |
| \`staticcheck -checks '-SA1019' ./pkg/frost/...\` | silent |
| \`gofmt -l ./pkg/frost/signing/\` | silent |

## Phase 7 plan

| PR | Scope | State |
|---|---|---|
| **7.1 (this)** | **AggregateBundle + bundle registry** | **open** |
| 7.2 | ROAST-driven signingParticipantSelector (consumes registry) | next |
| 7.3+ | Readiness manifest entry + integration testnet evidence + manifest flip | post-7.2 |

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the silent-error discipline in cleanup is
appropriate (alternative: log at WARN level).
- [ ] Reviewer confirms the TTL = session-handle TTL alignment is
intentional (alternative: longer-lived bundles).
…or (#3986)

## Summary

Second Phase-7 PR. Installs the ROAST-driven
\`signingParticipantSelector\` as the build-default when the
\`frost_roast_retry\` tag is set, consuming the per-session bundle
registry from Phase 7.1 (#3985) to compute the next attempt's
\`IncludedSet\` via \`EvaluateRoastRetryForSigning\`. Falls back to
the legacy retry shuffle whenever a precondition is missing.

Stacked on #3985 (Phase 7.1).

## Dispatch table

| Precondition | Action |
|---|---|
| Bundle absent for session | Legacy fallback |
| ROAST-retry registry empty | Legacy fallback |
| No session-handle binding for the attempt | Legacy fallback |
| All preconditions met | \`EvaluateRoastRetryForSigning\` |

Errors from \`EvaluateRoastRetryForSigning\` (\`ErrAttemptInfeasible\`,
resolver failures) are **propagated unchanged** per the RFC-21
Phase-6 hard-fail error taxonomy. Falling back on these runtime
errors would let one node use legacy retry while another uses
ROAST -- the signing group would fracture on \`NextAttempt\`
agreement.

## What lands

| File | Build tag | Role |
|---|---|---|
| \`signing_loop_selector_default_build.go\` | \`!frost_roast_retry\` | \`defaultSigningParticipantSelector\` returns the legacy implementation; the binary contains no ROAST-retry code paths. |
| \`signing_loop_selector_frost_roast_retry.go\` | \`frost_roast_retry\` | \`roastSigningParticipantSelector\` implements the dispatch table above. \`membersResolver\` closure maps 1-based \`group.MemberIndex\` to \`chain.Address\`. |
| \`roast_retry_attempt_handle_*.go\` (extended) | both | Public \`CurrentAttemptHandleForSession\` wrapper so the pkg/tbtc selector can read the binding. |

## Test coverage (8 cases)

| File | Build | Cases |
|---|---|---|
| \`signing_loop_selector_default_build_test.go\` | \`!frost_roast_retry\` | 1 (default returns legacy) |
| \`signing_loop_selector_frost_roast_retry_test.go\` | \`frost_roast_retry\` | 7 (default returns ROAST; three legacy-fallback scenarios; \`membersResolver\` happy + zero-index + out-of-range; **end-to-end happy path** running register → bind → record snapshots → aggregate → store bundle → select returns ROAST-derived addresses) |

## Verification

| Command | Result |
|---|---|
| \`go build ./...\` | clean |
| \`go test ./pkg/tbtc/... ./pkg/frost/...\` | pass |
| \`go test -tags 'frost_roast_retry' ./pkg/tbtc/... ./pkg/frost/signing/...\` | pass |
| \`staticcheck -checks '-SA1019' ./pkg/tbtc/... ./pkg/frost/...\` | silent |
| \`go vet ./pkg/tbtc/... ./pkg/frost/...\` | clean |
| \`gofmt -l ./pkg/tbtc/ ./pkg/frost/signing/\` | silent |

## Operational consequences

With this PR, the \`frost_roast_retry\`-tagged build executes the
full ROAST coordinator-driven retry path end-to-end when:

1. The operator opt-in env var (\`KEEP_CORE_FROST_ROAST_RETRY_ENABLED\`)
is set.
2. A coordinator is registered via \`RegisterRoastRetryCoordinator\`.
3. A session has progressed past attempt 1 (so a transition bundle
exists from Phase 7.1's cleanup hook).

Default builds and tagged builds without preconditions met still
execute the legacy retry shuffle, so the **behavioural rollback
path is intact**.

## Phase 7 status

| PR | Scope | State |
|---|---|---|
| 7.1 (#3985) | AggregateBundle + bundle registry | open |
| **7.2 (this)** | **ROAST-driven signingParticipantSelector** | **open** |
| 7.3+ | Readiness manifest entry + integration testnet evidence + manifest flip | post-7.2 |

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the hard-fail-on-runtime-error /
fallback-on-static-precondition dispatch matches the RFC-21 discipline.
- [ ] Reviewer confirms passing \`nil\` for the DKG public key in
\`EvaluateRoastRetryForSigning\` is acceptable (the bundle's attempt
context carries the seed binding; \`NextAttempt\` uses the provided pub
key only when constructing the *next* context).
## Summary

Operational documentation for the ROAST-driven retry path
introduced across RFC-21 Phases 1-7. Intended for node operators
and release engineers planning a rollout of the new retry
semantics.

Doc-only; no code changes; ~130 lines AsciiDoc.

Stacked on #3986 (Phase 7.2).

## What it covers

- **Activation prerequisites:** build tag + env var + coordinator
registration
- **Behavioural matrix:** all 5 combinations of build tag / env var /
registry / bundle and what each one runs
- **Error handling discipline:** static-vs-runtime taxonomy per the
RFC-21 Phase-6 resolution
- **Production rollout sequencing:** build with tag → verify V1
migration → stage operator opt-in → monitor → roll back via env var
- **Cross-references:** every file the multi-phase implementation
touched

## What it doesn't include

The **readiness manifest itself** -- the cross-repo evidence
ledger that gates production enablement -- lives in the
\`tlabs-xyz/tbtc\` monorepo's \`docs/operations/\` directory, not in
keep-core. This document is the keep-core-side operational
guide; the manifest is the operational gate. The manifest stays
in \`missing-no-go\` until real testnet evidence is attached
(per the standing constraint that readiness manifests are
machine-checked evidence, not aspirational documents).

## Phase 7 status

| PR | Scope | State |
|---|---|---|
| 7.1 (#3985) | AggregateBundle + bundle registry | open |
| 7.2 (#3986) | ROAST-driven signingParticipantSelector | open |
| **7.3 (this)** | **Operational rollout guide (docs)** | **open** |

The remaining Phase-7 work is *not in code*: integration testnet
evidence, manifest flip in the cross-repo monorepo, and
post-rollout build-tag removal (optional). Those happen on the
operations side, not in this repository.

## Test plan

- [ ] CI green (AsciiDoc renders cleanly via the docs build).
- [ ] Reviewer confirms the prerequisites and behavioural matrix match
the actual code paths.
- [ ] Reviewer confirms the cross-references are accurate.
Closes the M4 gap from the original PR #3866 review by adding the
two evidence categories the RFC-21 Phase-2 work left as future
work: validation-rejection evidence and first-write-wins-conflict
evidence. With this PR, the NextAttempt policy acts on all four
ROAST blame channels: it permanently excludes peers on
transport-overflow, validation-reject, and equivocation-conflict
evidence, and parks silent peers transiently. Previously only
overflow and silence were actionable.

Why this matters: a peer that only sends malformed messages
(validation rejects, never overflows the channel) was previously
indistinguishable from a silent peer. The transient silence-
parking policy would bench-and-reinstate them indefinitely, never
permanently excluding the malicious behaviour. Same for a peer
equivocating mid-attempt: the existing first-write-wins assembly
correctly dropped the conflicting retransmission but only logged
the event -- the bundle carried no structured evidence the
coordinator's policy could act on.

* pkg/frost/roast/attempt/evidence_recorder.go
  - EvidenceRecorder interface gains RecordReject(sender, reason)
    and RecordConflict(sender).
  - RejectQuotaDefault = 8, ConflictQuotaDefault = 4 (matches
    categoryQuota in RFC-21 Layer A).
  - Evidence struct extended with Rejects
    (map[MemberIndex][]RejectEntry: per-(sender, reason)) and
    Conflicts (map[MemberIndex]uint).
  - boundedRecorder: per-reason quota counter keeps each reason
    bucket independent so a peer cannot saturate one reason to
    mask another. Conflicts counter saturates at the conflict
    quota.
  - noOpRecorder: every category discards.
  - NewBoundedRecorderWithQuotas(overflow, reject, conflict)
    constructor for tests; existing NewBoundedRecorderWithQuota
    preserved for backward compat (defaults reject + conflict
    quotas).

* pkg/frost/roast/transition_message.go
  - RejectEntry (Sender + Reason + Count) and ConflictEntry
    (Sender + Count) wire types added.
  - LocalEvidenceSnapshot gains Rejects []RejectEntry and
    Conflicts []ConflictEntry, both omitempty.
  - NewLocalEvidenceSnapshot canonicalises into sorted slices:
    rejects ascending by Sender then by Reason; conflicts
    ascending by Sender.
  - Evidence() reconstructs the map form for downstream
    consumption.
  - Validate() enforces sorted-ascending invariants on both new
    slices.

* pkg/frost/roast/next_attempt.go
  - RejectExclusionThreshold = 1; ConflictExclusionThreshold = 1
    (per RFC-21 Layer B).
  - computeNextAttempt now consults rejectBlamedSenders and
    conflictBlamedSenders alongside the existing overflowBlamed
    set. All three feed into the permanent ExcludedSet.
  - blamedSenders helper factored to share the
    threshold-comparison + sort logic across the three category
    helpers.

* pkg/frost/signing/native_frost_protocol_frost_native.go and
* pkg/frost/signing/native_ffi_primitive_transitional_frost_native.go
  - Three reject sites: in each of the three receive loops, the
    shouldAcceptNativeFROSTMessage failure path now calls
    evidence.RecordReject(senderID, "validation_gate_rejected")
    before returning. (Previously the message was just dropped.)
  - Three conflict sites: the first-write-wins assembly loop's
    "dropping conflicting" branch now calls
    evidence.RecordConflict(senderID) immediately before the
    existing log line. (Previously only the log line.)

Tests (15 new cases):

* pkg/frost/roast/attempt/evidence_recorder_categories_test.go (7)
  - RecordReject accumulates by reason
  - RecordReject per-reason quota saturates
  - Per-reason quotas independent across reasons
  - RecordConflict accumulates and saturates
  - All three categories present in Snapshot after mixed input
  - NoOp recorder inert across all categories
  - RFC-quota constants match documented values

* pkg/frost/roast/next_attempt_categories_test.go (5)
  - Single reject crosses threshold -> permanent exclusion
  - Single conflict crosses threshold -> permanent exclusion
  - Reject and conflict on different senders -> both excluded
  - Empty rejects+conflicts -> no exclusion (sanity)
  - Threshold constants match RFC-21

* Receive-loop wiring is covered by existing send/recv tests
  combined with the recorder unit tests; no new behaviour test
  added at the integration level because the NoOp default keeps
  pre-RFC-21 receive semantics observably unchanged.

Verification:

* go build ./... + go build -tags 'frost_native frost_tbtc_signer
  frost_roast_retry' ./...  -- both clean
* go test ./pkg/frost/... + go test -race ./pkg/frost/roast/...
  + go test -tags 'frost_native frost_tbtc_signer
  frost_roast_retry' ./pkg/frost/... -- all pass (5 packages)
* staticcheck -checks '-SA1019' ./pkg/frost/... -- silent
* go vet ./pkg/frost/... + gofmt -l ./pkg/frost/ -- clean

This PR completes M4 from the original PR #3866 review. All four
ROAST evidence categories (overflow, reject, conflict, silence) are
now operational; the NextAttempt policy excludes on the first
three and parks transiently on the fourth, matching RFC-21
Layer B exactly.
mswilkison and others added 6 commits May 22, 2026 23:02
…ase-6 milestone)

Closes the Phase-6 milestone the RFC named but the
implementation skipped: receive callbacks now reject messages
whose AttemptContextHash does not match the session's bound
AttemptContext. Default builds and sessions without a ROAST-
attempt binding skip enforcement entirely, so the change is
observationally identical to pre-Phase-6 behaviour outside the
ROAST path.

The Phase 1B AttemptContextHash field was structural-only
(present, 32 bytes) until now. Senders could populate it but
receivers ignored the value -- meaning a peer could send a
message bound to attempt N to a receiver running attempt N+1 of
the same session and the receiver would accept it as long as
SessionID matched. This PR closes that gap.

* pkg/frost/signing/attempt_context_binding_validation_frost_native.go
  (new, gated frost_native)
  - attemptContextHashCarrier interface so the helper covers all
    three FROST/tbtc-signer message types via their existing
    GetAttemptContextHash methods.
  - verifyMessageAttemptContextHash: looks up the session's
    handle binding via currentAttemptHandleForCollect. No
    binding -> return nil (legacy / default build). Binding
    present + matching hash -> return nil. Binding present +
    missing hash -> ErrAttemptContextHashMissing. Binding
    present + mismatched hash -> ErrAttemptContextHashMismatch.

* pkg/frost/signing/native_frost_protocol_frost_native.go and
* pkg/frost/signing/native_ffi_primitive_transitional_frost_native.go
  Three receive callbacks updated. After the existing
  shouldAcceptNativeFROSTMessage gate, each callback now calls
  verifyMessageAttemptContextHash. Failure paths call
  evidence.RecordReject(senderID, "attempt_context_hash_mismatch")
  so the policy can permanently exclude peers that consistently
  send stale-attempt messages.

Tests:

* attempt_context_binding_validation_frost_native_test.go (gated
  frost_native && frost_roast_retry, 5 cases)
  - No binding -> any message passes
  - Binding + matching hash -> passes
  - Binding + missing hash -> ErrAttemptContextHashMissing
  - Binding + mismatched hash -> ErrAttemptContextHashMismatch
  - Integration with a real
    nativeFROSTRoundOneCommitmentMessage via SetAttemptContextHash;
    rebinding to a different context produces a mismatch

* attempt_context_binding_validation_default_build_test.go
  (gated frost_native && !frost_roast_retry, 1 case)
  - In the default build the helper always passes regardless of
    message contents, matching the rollback promise.

Verification:

* go build ./... + go build -tags 'frost_native frost_tbtc_signer
  frost_roast_retry' ./... -- both clean
* go test ./pkg/frost/... -- pass
* go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/... -- pass
* go test -tags 'frost_native frost_tbtc_signer frost_roast_retry'
  ./pkg/frost/... -- pass (5 packages)
* staticcheck -checks '-SA1019' ./pkg/frost/... -- silent
* gofmt -l ./pkg/frost/signing/ -- silent
* go vet ./pkg/frost/... -- clean

Migration path:

* Phase 1B (already shipped): AttemptContextHash is structurally
  validated when present, optional otherwise.
* This PR: the field is enforced *only* when the session has a
  ROAST-attempt binding. Sessions without a binding -- including
  every default-build session and every non-ROAST tagged-build
  session -- continue to ignore the field.
* Future PR: once production has rolled out a version that
  populates the field on every outbound message, enforcement can
  be made unconditional (binding-or-not).
Adds process-wide cumulative counters for the three evidence
categories (overflow / reject / conflict) and exposes them through
keep-core's clientinfo registry so operators can observe per-
category event rates via the standard Prometheus scrape.

The counters increment whenever a metrics-emitting recorder
records an event. In default builds and in unregistered-coordinator
states the recorder is NoOp, so the counters stay at zero.
Operators only see non-zero values once the ROAST-retry registry
is populated and live signing flows record evidence -- the
"do I have ROAST retry running?" smoke test.

* pkg/frost/signing/roast_retry_metrics.go (new, untagged)
  - Cumulative atomic counters: roastRetryOverflowEvents,
    roastRetryRejectEvents, roastRetryConflictEvents.
  - RegisterRoastRetryMetrics(*clientinfo.Registry) registers
    Source functions under the "frost_roast_retry" application
    prefix, producing metrics named:
      - frost_roast_retry_overflow_events_total
      - frost_roast_retry_reject_events_total
      - frost_roast_retry_conflict_events_total
    via the existing ObserveApplicationSource mechanism.
  - metricsEmittingRecorder wraps an attempt.EvidenceRecorder
    and bumps the matching counter on each Record* call before
    delegating to the inner recorder.
  - Nil-safe: a nil inner recorder collapses to NoOp; a nil
    clientinfo.Registry is a no-op registration.

* pkg/frost/signing/roast_retry_recorder.go (modified)
  - roastRetryRecorderForCollect now wraps the bounded recorder
    with newMetricsEmittingRecorder when the registry is
    populated. NoOp path is unchanged (no metrics emission).

Tests (6 cases in roast_retry_metrics_test.go):

* Counters increment on Record* (with different per-category counts).
* Snapshot delegates to the inner recorder.
* Nil inner falls back to NoOp without panicking.
* Unregistered coordinator -> NoOp recorder -> no counter bumps.
* Concurrent counter increments are race-safe.
* RegisterRoastRetryMetrics(nil) is a no-op (defensive guard).

Operator wiring:

The keep-core node's startup sequence should call
RegisterRoastRetryMetrics(&clientinfo.Registry) alongside the
existing registry observation calls. Documentation will be added
in a follow-up to the rollout guide
(docs/development/frost-roast-retry-rollout.adoc).

Verification:

* go build ./... -- clean
* go test ./pkg/frost/... -- pass (5 packages)
* go test -race ./pkg/frost/signing/... -- pass
* go test -tags 'frost_native frost_tbtc_signer frost_roast_retry'
  ./pkg/frost/... -- pass (5 packages)
* staticcheck -checks '-SA1019' ./pkg/frost/... -- silent
* go vet ./pkg/frost/... -- clean
* gofmt -l ./pkg/frost/signing/ -- silent

Stacked on the AttemptContextHash enforcement PR.
…3988)

## Summary

Closes the **M4 gap** from the original PR #3866 review by adding
the two evidence categories the RFC-21 Phase-2 work left as future
work: **validation-rejection evidence** and **first-write-wins-conflict
evidence**.

With this PR, the \`NextAttempt\` policy acts on all four ROAST
blame channels: it permanently excludes peers on transport-overflow,
validation-reject, and equivocation-conflict evidence, and parks
silent peers transiently. Previously only overflow and silence were
actionable.

## Why this matters

A peer that only sends **malformed messages** (validation rejects,
never overflows the channel) was previously indistinguishable from
a silent peer. The transient silence-parking policy would
bench-and-reinstate them indefinitely, never permanently excluding
the malicious behaviour. Same for a peer **equivocating mid-attempt**:
the existing first-write-wins assembly correctly dropped the
conflicting retransmission but only logged the event -- the bundle
carried no structured evidence the coordinator's policy could act
on.

## What lands

### Recorder API

| Surface | Notes |
|---|---|
| \`RecordReject(sender, reason)\` | reason captured verbatim; per-reason quota counter |
| \`RecordConflict(sender)\` | saturates at conflict quota |
| \`RejectQuotaDefault = 8\`, \`ConflictQuotaDefault = 4\` | matches RFC-21 Layer A categoryQuota |
| Per-reason quotas independent | peer cannot saturate one reason to mask another |
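
A sketch of saturating, per-reason quota counting under the defaults quoted above (8 for rejects, 4 for conflicts). The struct layout, the `rejectKey` type, and the accessor names are assumptions for illustration and not the real `boundedRecorder`.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	rejectQuotaDefault   = 8 // value quoted in the table above
	conflictQuotaDefault = 4 // value quoted in the table above
)

// rejectKey makes each (sender, reason) pair its own bucket.
type rejectKey struct {
	sender uint8
	reason string
}

// boundedRecorder counts evidence but stops incrementing at the quota.
type boundedRecorder struct {
	mu            sync.Mutex
	rejectQuota   uint
	conflictQuota uint
	rejects       map[rejectKey]uint
	conflicts     map[uint8]uint
}

func newBoundedRecorder(rejectQuota, conflictQuota uint) *boundedRecorder {
	return &boundedRecorder{
		rejectQuota:   rejectQuota,
		conflictQuota: conflictQuota,
		rejects:       make(map[rejectKey]uint),
		conflicts:     make(map[uint8]uint),
	}
}

// RecordReject saturates per (sender, reason), so filling one reason
// bucket leaves the other buckets of the same sender untouched.
func (r *boundedRecorder) RecordReject(sender uint8, reason string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	k := rejectKey{sender: sender, reason: reason}
	if r.rejects[k] < r.rejectQuota {
		r.rejects[k]++
	}
}

// RecordConflict saturates per sender at the conflict quota.
func (r *boundedRecorder) RecordConflict(sender uint8) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.conflicts[sender] < r.conflictQuota {
		r.conflicts[sender]++
	}
}

func (r *boundedRecorder) rejectCount(sender uint8, reason string) uint {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.rejects[rejectKey{sender: sender, reason: reason}]
}

func (r *boundedRecorder) conflictCount(sender uint8) uint {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.conflicts[sender]
}

func main() {
	rec := newBoundedRecorder(rejectQuotaDefault, conflictQuotaDefault)
	for i := 0; i < 10; i++ {
		rec.RecordReject(1, "validation_gate_rejected")
	}
	rec.RecordReject(1, "attempt_context_hash_mismatch")
	fmt.Println(rec.rejectCount(1, "validation_gate_rejected"), rec.rejectCount(1, "attempt_context_hash_mismatch"))
}
```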

### Wire types

| Type | Sort order | Cap |
|---|---|---|
| \`RejectEntry{Sender, Reason, Count}\` | asc by Sender, then asc by Reason | per-attempt evidence size bounded by Σ quotas |
| \`ConflictEntry{Sender, Count}\` | asc by Sender | per-attempt evidence size bounded by Σ quotas |

Both fields use \`omitempty\` so pre-PR snapshots round-trip without
the new fields. \`Validate()\` enforces sorted-ascending invariants.

### NextAttempt policy

| Threshold | Value | Source |
|---|---|---|
| \`RejectExclusionThreshold\` | 1 | RFC-21 Layer B ("any non-transport reject is sufficient cause") |
| \`ConflictExclusionThreshold\` | 1 | A single conflict is byzantine evidence |

\`computeNextAttempt\` merges \`overflowBlamed\`, \`rejectBlamed\`,
\`conflictBlamed\` into the permanent ExcludedSet. The
\`blamedSenders\` helper is factored out so all three categories
share the deterministic sort + threshold-comparison logic.
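
A sketch of the shared threshold-and-sort helper and the merge into one excluded set. The per-sender count maps, the `excludedSet` function, and the overflow threshold used in `main` are assumptions for illustration; only the reject and conflict thresholds of 1 come from the table above.

```go
package main

import (
	"fmt"
	"sort"
)

type memberIndex uint

const (
	rejectExclusionThreshold   = 1 // from the table above
	conflictExclusionThreshold = 1 // from the table above
)

// blamedSenders returns, in ascending order, every sender whose count
// reaches the threshold. Sorting removes map-iteration nondeterminism.
func blamedSenders(counts map[memberIndex]uint, threshold uint) []memberIndex {
	blamed := make([]memberIndex, 0, len(counts))
	for sender, count := range counts {
		if count >= threshold {
			blamed = append(blamed, sender)
		}
	}
	sort.Slice(blamed, func(i, j int) bool { return blamed[i] < blamed[j] })
	return blamed
}

// excludedSet merges the per-category results into one sorted,
// de-duplicated list.
func excludedSet(categories ...[]memberIndex) []memberIndex {
	seen := make(map[memberIndex]struct{})
	merged := []memberIndex{}
	for _, category := range categories {
		for _, sender := range category {
			if _, ok := seen[sender]; !ok {
				seen[sender] = struct{}{}
				merged = append(merged, sender)
			}
		}
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i] < merged[j] })
	return merged
}

func main() {
	const overflowThreshold = 5 // hypothetical value for this sketch

	overflow := map[memberIndex]uint{2: 1}
	rejects := map[memberIndex]uint{3: 1}
	conflicts := map[memberIndex]uint{1: 1, 3: 2}

	excluded := excludedSet(
		blamedSenders(overflow, overflowThreshold),
		blamedSenders(rejects, rejectExclusionThreshold),
		blamedSenders(conflicts, conflictExclusionThreshold),
	)
	fmt.Println(len(excluded))
}
```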

### Receive-loop wiring

Three reject sites and three conflict sites updated across the two
files that house the three FROST/tbtc-signer receive loops:

| Site | Was | Now |
|---|---|---|
| \`shouldAcceptNativeFROSTMessage\` returns false | silent drop | \`evidence.RecordReject(senderID, "validation_gate_rejected")\` + drop |
| First-write-wins conflict in assembly loop | warn log only | \`evidence.RecordConflict(senderID)\` + warn log |

## Test coverage (15 new cases)

- 7 recorder tests: accumulation, per-reason quota saturation,
per-reason independence, conflict saturation, all-categories-present,
NoOp-inert, RFC-constant assertions
- 5 policy tests: single reject excludes, single conflict excludes,
reject+conflict on different senders, empty evidence (sanity),
threshold-constant assertions
- Receive-loop wiring is covered indirectly by the recorder unit tests;
the NoOp default keeps pre-RFC-21 receive semantics observably unchanged
so no integration-level test is required.

## Verification

| Command | Result |
|---|---|
| \`go build ./...\` + \`go build -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./...\` | both clean |
| \`go test ./pkg/frost/...\` + race | pass |
| \`go test -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./pkg/frost/...\` | pass (5 packages) |
| \`staticcheck -checks '-SA1019' ./pkg/frost/...\` | silent |
| \`go vet ./pkg/frost/...\` + \`gofmt -l ./pkg/frost/\` | clean |

## RFC-21 status

With this PR, all four ROAST evidence categories are operational.
M4 from the original PR #3866 review is **fully closed**. The
keep-core code arc for RFC-21 is now feature-complete; remaining
work is operations-side (integration testnet, manifest flip).

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the per-reason quota independence is the right
semantics (alternative: single per-sender reject counter).
- [ ] Reviewer confirms threshold = 1 for both reject and conflict
(alternative: higher to absorb noise; trade-off is faster vs slower
exclusion of misbehaving peers).
…ase-6 milestone) (#3989)

## Summary

Closes the Phase-6 milestone the RFC named but the implementation
skipped: receive callbacks now reject messages whose
\`AttemptContextHash\` does not match the session's bound
\`AttemptContext\`. Default builds and sessions without a
ROAST-attempt binding skip enforcement entirely, so the change
is observationally identical to pre-Phase-6 behaviour outside
the ROAST path.

Stacked on #3988 (M4 closure).

## Why this matters

The Phase 1B \`AttemptContextHash\` field was structural-only
(present, 32 bytes) until now. Senders could populate it but
receivers ignored the value -- meaning a peer could send a
message bound to attempt N to a receiver running attempt N+1 of
the same session and the receiver would accept it as long as
\`SessionID\` matched. This PR closes that gap.

## What lands

| Surface | Behaviour |
|---|---|
| \`verifyMessageAttemptContextHash(msg, sessionID)\` | No binding → pass (legacy/default). Binding + matching hash → pass. Binding + missing hash → \`ErrAttemptContextHashMissing\`. Binding + mismatch → \`ErrAttemptContextHashMismatch\`. |
| \`attemptContextHashCarrier\` interface | One implementation covers all three FROST/tbtc-signer message types via their existing \`GetAttemptContextHash\` methods. |
| 3 receive callbacks updated | After \`shouldAcceptNativeFROSTMessage\`, call \`verifyMessageAttemptContextHash\`. Failure → \`evidence.RecordReject(senderID, "attempt_context_hash_mismatch")\` so the policy can permanently exclude peers that consistently send stale-attempt messages. |

## Test coverage

| File | Build | Cases |
|---|---|---|
| \`attempt_context_binding_validation_frost_native_test.go\` | \`frost_native && frost_roast_retry\` | 5 (no-binding, matching, missing, mismatch, real-message integration with rebind) |
| \`attempt_context_binding_validation_default_build_test.go\` | \`frost_native && !frost_roast_retry\` | 1 (default build always passes; rollback promise upheld) |

## Migration path

1. **Phase 1B (already shipped):** field structurally validated when
present, optional otherwise.
2. **This PR:** enforced *only* when the session has a ROAST-attempt
binding. Default builds and non-ROAST tagged sessions continue to ignore
the field.
3. **Future PR:** once production has rolled out a version that
populates the field on every outbound message, enforcement can be made
unconditional.

## Verification

| Command | Result |
|---|---|
| \`go build ./...\` + tagged | both clean |
| \`go test ./pkg/frost/...\` | pass |
| \`go test -tags 'frost_native frost_tbtc_signer'\` | pass |
| \`go test -tags 'frost_native frost_tbtc_signer frost_roast_retry'\` | pass |
| \`staticcheck -checks '-SA1019' ./pkg/frost/...\` | silent |
| \`go vet ./pkg/frost/...\` | clean |
| \`gofmt -l ./pkg/frost/signing/\` | silent |

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the "no binding = skip enforcement" gate is
acceptable (alternative: always enforce when build tag set, regardless
of binding -- riskier during transitions).
- [ ] Reviewer confirms the failure-mode rejects record evidence rather
than just dropping (so misbehaving peers accumulate exclusion-worthy
counts).
…nfo (#3990)

## Summary

Process-wide cumulative counters for the three evidence categories
(overflow / reject / conflict), exposed through keep-core's
\`clientinfo\` registry so operators can observe per-category event
rates via the standard Prometheus scrape.

In default builds and unregistered-coordinator states, the
metrics-emitting recorder is bypassed entirely (the receive loops
use \`attempt.NoOpRecorder\`), so the counters stay at zero. Once
the ROAST-retry registry is populated and live signing flows
record evidence, the counters increment -- providing the
"do I have ROAST retry running?" smoke test from operator
dashboards.

Stacked on #3989 (AttemptContextHash enforcement).

## What lands

| File | Role |
|---|---|
| \`roast_retry_metrics.go\` (new, untagged) | Cumulative atomic counters; \`RegisterRoastRetryMetrics(*clientinfo.Registry)\` registers Source functions under the \`frost_roast_retry\` application prefix; \`metricsEmittingRecorder\` wraps the bounded recorder and bumps the counter on each Record* call. |
| \`roast_retry_recorder.go\` (modified) | \`roastRetryRecorderForCollect\` now wraps the bounded recorder with \`newMetricsEmittingRecorder\` when the registry is populated. |

## Metrics exposed

Via \`clientinfo.Registry.ObserveApplicationSource\`:

| Metric name | Description |
|---|---|
| \`frost_roast_retry_overflow_events_total\` | Cumulative count of receive-channel overflow events |
| \`frost_roast_retry_reject_events_total\` | Cumulative count of validation-gate rejections (incl. \`attempt_context_hash_mismatch\` from #3989) |
| \`frost_roast_retry_conflict_events_total\` | Cumulative count of first-write-wins equivocation events |
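
The wrapping pattern behind these counters can be sketched as follows. The names `recorder`, `noOpRecorder`, `counters`, and the `RecordOverflow` signature are assumptions for illustration; the sketch also passes the counters in by pointer so it stays testable, whereas the description above says the real counters are process-wide. The `clientinfo` registration step is omitted.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// recorder is a hypothetical, reduced version of the evidence recorder.
type recorder interface {
	RecordOverflow(sender uint32)
	RecordReject(sender uint32, reason string)
	RecordConflict(sender uint32)
}

type noOpRecorder struct{}

func (noOpRecorder) RecordOverflow(uint32)       {}
func (noOpRecorder) RecordReject(uint32, string) {}
func (noOpRecorder) RecordConflict(uint32)       {}

// counters holds one cumulative total per evidence category.
type counters struct {
	overflow, reject, conflict atomic.Uint64
}

// metricsEmittingRecorder bumps a counter, then delegates to inner.
type metricsEmittingRecorder struct {
	inner recorder
	c     *counters
}

// newMetricsEmittingRecorder falls back to a no-op inner recorder when
// given nil, so callers never hit a nil dereference.
func newMetricsEmittingRecorder(inner recorder, c *counters) recorder {
	if inner == nil {
		inner = noOpRecorder{}
	}
	return &metricsEmittingRecorder{inner: inner, c: c}
}

func (m *metricsEmittingRecorder) RecordOverflow(s uint32) {
	m.c.overflow.Add(1)
	m.inner.RecordOverflow(s)
}

func (m *metricsEmittingRecorder) RecordReject(s uint32, reason string) {
	m.c.reject.Add(1)
	m.inner.RecordReject(s, reason)
}

func (m *metricsEmittingRecorder) RecordConflict(s uint32) {
	m.c.conflict.Add(1)
	m.inner.RecordConflict(s)
}

func main() {
	c := &counters{}
	rec := newMetricsEmittingRecorder(nil, c)

	// 16 workers x 100 calls, matching the concurrency test case shape.
	var wg sync.WaitGroup
	for w := 0; w < 16; w++ {
		wg.Add(1)
		go func(sender uint32) {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				rec.RecordReject(sender, "validation_gate_rejected")
			}
		}(uint32(w))
	}
	wg.Wait()
	fmt.Println(c.reject.Load(), c.overflow.Load(), c.conflict.Load()) // 1600 0 0
}
```

Because the increment happens in the wrapper, a code path that uses the no-op recorder directly never touches the counters, which is what keeps them at zero in default builds.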

## Test coverage (6 cases)

- Counters increment on `Record*` (different per-category counts)
- Snapshot delegates to inner recorder
- Nil inner falls back to NoOp without panicking
- Unregistered coordinator → NoOp recorder → no counter bumps
- Concurrent counter increments are race-safe (16 workers × 100 calls)
- `RegisterRoastRetryMetrics(nil)` is a no-op (defensive guard)

## Operator wiring

The keep-core node's startup sequence should call:

\`\`\`go
signing.RegisterRoastRetryMetrics(clientinfoRegistry)
\`\`\`

alongside the existing registry observation calls. A follow-up to
\`docs/development/frost-roast-retry-rollout.adoc\` will document
this step.

## Verification

| Command | Result |
|---|---|
| \`go build ./...\` | clean |
| \`go test ./pkg/frost/...\` | pass (5 packages) |
| \`go test -race ./pkg/frost/signing/...\` | pass |
| \`go test -tags 'frost_native frost_tbtc_signer frost_roast_retry' ./pkg/frost/...\` | pass |
| \`staticcheck -checks '-SA1019' ./pkg/frost/...\` | silent |
| \`go vet ./pkg/frost/...\` | clean |
| \`gofmt -l ./pkg/frost/signing/\` | silent |

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the process-wide cumulative counter shape
(alternative: per-session gauges, more granular but harder to query at a
glance).
- [ ] Reviewer confirms the \`frost_roast_retry\` application prefix is
acceptable (alternative: more specific prefix like
\`frost_roast_retry_evidence\`).
Adds a top-of-file design-rationale block to roast_retry_orchestration.go
that captures the load-bearing decision (from RFC-21 Phase 6 review)
about which orchestration errors are fallback-eligible and which must
hard-fail.

The decision had been distributed across commit messages, the RFC text,
and inline comments on individual sentinel definitions. The
block centralises it next to the code that enforces it, so future
maintainers can find the rationale without reconstructing it by
spelunking through history.

Key statements captured:

  STATIC errors  -> safe to fall back to the legacy retry path. Every
                    honest signer observes the same node-local config
                    at startup so fallback decisions are deterministic
                    across the group. Sentinel:
                    ErrNoRoastRetryCoordinatorRegistered, detected via
                    errors.Is in signing_loop_roast_dispatcher.go.

  RUNTIME errors -> HARD FAIL. Per-attempt protocol state errors can be
                    observed by some participants and not others within
                    the same attempt; falling back to legacy under those
                    conditions creates split-brain (some operators
                    running new code, others running legacy on the same
                    attempt). The orchestration layer returns these as
                    bare errors that the dispatcher treats as terminal.

The block also notes the historical redirect: the earlier design had
BeginAttempt failures fall back, on the assumption that BeginAttempt
was cheap idempotent setup. Review identified that BeginAttempt mutates
per-attempt state and can fail from races with concurrent receives,
which the static-error fallback can't safely handle. Documenting the
"why" prevents the regression from being re-introduced by a maintainer
who reads only the code.

Pure documentation -- no behaviour change, no test changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mswilkison and others added 5 commits May 23, 2026 12:08
Mirrors the cross-repo HASH160-based wallet pubkey hash derivation
fixture from the tbtc bridge repo (companion PR
tlabs-xyz/tbtc#432, fixture at
docs/test-vectors/wallet-pubkey-hash-derivation-vectors-v1.json).

The byte-identical JSON is checked into both repos; each side's test
reads its local copy and asserts its own derivation function
reproduces the expected output. If
frost.WalletPublicKeyHashCompatibilityAlias drifts from
BitcoinTx.deriveWalletPubKeyHashFromXOnly on the bridge side, at least
one repo's test fails.

The drift this catches is the silent killer for the bridge-protocol
identity contract: if keep-core derives a different 20-byte alias
than the bridge for the same input, FROST wallets registered by the
DKG coordinator land at addresses the bridge doesn't recognize, or
vice versa. The failure mode is invisible until a wallet is actually
created in production.

Test cases

  TestFrostWalletPubKeyHashDerivationVectors
    Asserts frost.WalletPublicKeyHashCompatibilityAlias produces the
    expected 20-byte alias for every FROST vector
    (HASH160(0x02 || xOnlyOutputKey)).

  TestEcdsaCompressedPubKeyHash160Vectors
    Asserts HASH160 of the compressed pubkey matches the expected
    value for every ECDSA vector. The bridge performs this derivation
    implicitly during registerNewWallet (compress then hash160); this
    test pins the algorithm on the keep-core side using the same
    vectors the bridge pins on its side.

  TestDriftCheckMetadata
    Pins the drift_check.tbtc_path / drift_check.keep_core_path /
    drift_check.rule fields, so a future cross-repo CI sync check
    has stable references.

  TestFixtureFileShouldExistAtMirrorPath
    Documents the convention that the file lives at the path the
    fixture self-declares; a nudge for anyone moving the file.

Companion PR

  tlabs-xyz/tbtc#432 lands the same JSON fixture and the bridge-side
  test against TestBitcoinTx.deriveWalletPubKeyHashFromXOnly. Both PRs
  ship together; landing only one provides no drift protection.

Lineage

  Surfaced in the cross-PR review re-evaluation, originally flagged
  as "Cross-repo walletID derivation test fixture -- separate effort."
  Priority was raised when the Phase B-2 keep-core DKG coordination
  protocol became part of the active roadmap; that phase produces
  walletIDs the bridge must accept, and this fixture validates the
  contract before B-2's implementation goes through end-to-end
  testing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The derivationFixture type declaration had one extra space between
each field name and its type, which gofmt's alignment rule rejects.
The five fields share the same column-aligned formatting; trim the
extra space per field so `gofmt -l .` returns clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…file via runtime.Caller

Codex round-2 validation caught that the previous
walletPubKeyHashDerivationVectorsPath constant equalled "testdata/..."
(package-relative, used by os.ReadFile because go test runs with the
package dir as cwd), but the fixture's drift_check.keep_core_path
declares "pkg/frost/testdata/..." (repo-root-relative, the canonical
location for cross-repo sync tooling).

These two paths refer to the same file but are intentionally
different representations. TestDriftCheckMetadata compared them
directly via != and failed unconditionally;
TestFixtureFileShouldExistAtMirrorPath called
filepath.Abs(fixture.DriftCheck.KeepCorePath) from the package cwd and
stat'd pkg/frost/pkg/frost/testdata/..., which doesn't exist.

Fix

  - Split the constant into two named constants, each documenting its
    convention:
      walletPubKeyHashDerivationVectorsTestPath = "testdata/..."
        (package-relative, for os.ReadFile from the test's cwd)
      walletPubKeyHashDerivationVectorsRepoPath = "pkg/frost/testdata/..."
        (repo-root-relative, matches the fixture metadata)

  - TestDriftCheckMetadata: assert against the repo-relative constant,
    not the package-relative one. Now compares apples to apples.

  - TestFixtureFileShouldExistAtMirrorPath: resolve
    fixture.DriftCheck.KeepCorePath relative to the repo root,
    obtained by walking two directories up from this test file's
    location via runtime.Caller(0). This stat's the right path
    regardless of which cwd `go test` ran with.

Local verification

   go test ./pkg/frost -run "PubKeyHash|DriftCheck|FixtureFile" -v

   ...
   --- PASS: TestFrostWalletPubKeyHashDerivationVectors (0.00s)
   --- PASS: TestEcdsaCompressedPubKeyHash160Vectors (0.00s)
   --- PASS: TestDriftCheckMetadata (0.00s)
   --- PASS: TestFixtureFileShouldExistAtMirrorPath (0.00s)
   PASS
   ok  	github.com/keep-network/keep-core/pkg/frost  0.225s

gofmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#3993)

## Why

The RFC-21 Phase 6 review decided which orchestration errors are
fallback-eligible (static config errors → safe to fall back to legacy
retry path) and which must hard-fail (runtime per-attempt errors → no
fallback, since per-participant divergence creates split-brain group
fracture). The rationale lived in commit messages, the RFC text, and
inline comments on individual sentinels — distributed enough that a
future maintainer reading just \`roast_retry_orchestration.go\` could
miss the load-bearing constraint.

This PR adds a top-of-file design-rationale block that centralises the
decision in the place that enforces it.

## What changed

- One file changed: \`pkg/frost/signing/roast_retry_orchestration.go\`
- Pure documentation: no behavior change, no test changes, no API change
- 49 lines added (one comment block)

## What it captures

1. **STATIC vs RUNTIME classification** — explicit definitions, with the
sentinel (\`ErrNoRoastRetryCoordinatorRegistered\`) and detection
mechanism (\`errors.Is\` in \`signing_loop_roast_dispatcher.go\`) named.
2. **Why static-error fallback is safe** — every honest signer observes
the same node-local config at startup, so the fallback decision is
deterministic across the group.
3. **Why runtime-error fallback is unsafe** — per-attempt protocol state
errors can be observed by some participants and not others within the
same attempt; fallback would put some operators on new code and others
on legacy for the same attempt.
4. **Enforcement rule** — any error surfaced from this package that is
intended to permit fallback MUST be the sentinel; wrapping ANY runtime
error in the sentinel is a safety regression that PR reviewers should
reject.
5. **Historical redirect** — the earlier design had \`BeginAttempt\`
failures fall back, on the assumption that BeginAttempt was cheap
idempotent setup. Review identified that BeginAttempt mutates
per-attempt state and can fail from races with concurrent receives; the
taxonomy was tightened so only true configuration errors are
fallback-eligible.

## Lineage

Surfaced in the cross-PR review re-evaluation following PR #3866
follow-up landings. Originally tracked as "Document static-vs-runtime
classification canonically" — initially flagged as "available if you
want," now elevated because the rationale was the most important
architectural decision in the RFC-21 stack and is currently the easiest
piece of design context to lose.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Why

The bridge contracts (tlabs-xyz/tbtc) and the keep-core FROST protocol
(this repo) both derive the 20-byte walletPubKeyHash from cryptographic
inputs:

- **Bridge** (\`BitcoinTx.deriveWalletPubKeyHashFromXOnly\`) computes
\`bytes20(HASH160(0x02 || xOnlyKey))\` for FROST wallets, and HASH160 of
the compressed ECDSA pubkey for legacy wallets.
- **keep-core** (\`frost.WalletPublicKeyHashCompatibilityAlias\` in
\`pkg/frost/types.go\`) computes the same HASH160(0x02 || outputKey).

Drift between these two derivations silently breaks the bridge-protocol
identity contract for any wallet whose canonical identity is established
cross-repo. Today the two agree (both invoke HASH160 the same way) — but
there's no enforcement, and a future refactor on either side could
introduce silent drift that only surfaces when production traffic
doesn't match.

The Phase B-2 work in this repo (the FROST DKG coordination protocol)
will produce walletIDs that the bridge must accept. Pre-staging the
fixture now lets B-2's implementation be validated as it's built.

## What

Mirrors the cross-repo fixture and adds a Go test that consumes it.

The fixture lives byte-identically in both repos:
- tbtc: \`docs/test-vectors/wallet-pubkey-hash-derivation-vectors-v1.json\` (companion PR tlabs-xyz/tbtc#432)
- keep-core: \`pkg/frost/testdata/wallet-pubkey-hash-derivation-vectors-v1.json\` (this PR)

Each repo's test reads its local copy and asserts its own derivation
function reproduces the expected output. If either side drifts, at least
one repo's test fails.

## Test cases

- **\`TestFrostWalletPubKeyHashDerivationVectors\`** — asserts
\`frost.WalletPublicKeyHashCompatibilityAlias\` produces the expected
20-byte alias for every FROST vector (HASH160(0x02 || xOnlyOutputKey)).
- **\`TestEcdsaCompressedPubKeyHash160Vectors\`** — asserts HASH160 of
the compressed pubkey matches expected for every ECDSA vector. The
bridge performs this implicitly during \`registerNewWallet\` (compress
then HASH160); this test pins the algorithm on the keep-core side.
- **\`TestDriftCheckMetadata\`** — pins the \`drift_check.tbtc_path\` /
\`drift_check.keep_core_path\` / \`drift_check.rule\` fields so a future
cross-repo CI sync check has stable references.
- **\`TestFixtureFileShouldExistAtMirrorPath\`** — documents the
convention that the file lives at the path the fixture self-declares.

## Vectors

**ECDSA legacy** (HASH160 of compressed pubkey):
- secp256k1 generator point compressed (well-known Bitcoin vector; its
P2PKH address is 1BgGZ9tcN4rm9KBzDn7KprQz87SZ26SAMH)
- Near-zero scalar pubkey
- The tBTC ECDSA test fixture's pubkey (cross-validates against the
bridge-side \`ecdsaWalletTestData.pubKeyHash160\` constant)

**FROST P2TR** (HASH160(0x02 || xOnlyOutputKey)):
- Representative key with non-zero high 12 bytes (matches the
native-shape constraint on the FROST registration entry point from
tlabs-xyz/tbtc#431)
- All-ones x-only key (regression)
- All-max x-only key (boundary)

## Why "fixture lives in both repos" instead of a shared submodule

Considered but rejected:
- **Git submodule** — adds tooling complexity for a 3 KB JSON file
- **Single-source-of-truth repo** — requires bootstrapping a third repo
or vendoring; both options add coordination friction
- **Dual-checked file** — simplest; the \`drift_check\` metadata
captures the rule, and a future small CI job can hash-compare the two
files

The dual-checked approach trades a slightly weaker guarantee (file drift
can happen between repos if no one notices) for a much smaller
operational footprint. The test-level drift (different derivation
algorithms producing different outputs from the same input) is the
load-bearing failure mode and is fully covered.

## Companion PR

tlabs-xyz/tbtc#432 lands the same JSON fixture and a TypeScript/Hardhat
test against \`TestBitcoinTx.deriveWalletPubKeyHashFromXOnly\` and the
off-chain HASH160 path. Both PRs are intended to land together; landing
only one provides no drift protection.

## Lineage

Surfaced in the cross-PR review re-evaluation, originally flagged as
"Cross-repo walletID derivation test fixture — separate effort."
Priority was raised when Phase B-2 (this repo's FROST DKG coordinator)
became part of the active roadmap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>